217 research outputs found

Graphle: Interactive exploration of large, dense graphs

Author: Huttenhower Curtis
Mehmood Sajid O
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: A wide variety of biological data can be modeled as network structures, including experimental results (e.g. protein-protein interactions), computational predictions (e.g. functional interaction networks), or curated structures (e.g. the Gene Ontology). While several tools exist for visualizing large graphs at a global level or small graphs in detail, previous systems have generally not allowed interactive analysis of dense networks containing thousands of vertices at a level of detail useful for biologists. Investigators often wish to explore specific portions of such networks from a detailed, gene-specific perspective, and balancing this requirement with the networks' large size, complex structure, and rich metadata is a substantial computational challenge. Results: Graphle is an online interface to large collections of arbitrary undirected, weighted graphs, each possibly containing tens of thousands of vertices (e.g. genes) and hundreds of millions of edges (e.g. interactions). These are stored on a centralized server and accessed efficiently through an interactive Java applet. The Graphle applet allows a user to examine specific portions of a graph, retrieving the relevant neighborhood around a set of query vertices (genes). This neighborhood can then be refined and modified interactively, and the results can be saved either as publication-quality images or as raw data for further analysis. The Graphle web site currently includes several hundred biological networks representing predicted functional relationships from three heterogeneous data integration systems: S. cerevisiae data from bioPIXIE, E. coli data using MEFIT, and H. sapiens data from HEFalMp. Conclusions: Graphle serves as a search and visualization engine for biological networks, which can be managed locally (simplifying collaborative data sharing) and investigated remotely. The Graphle framework is freely downloadable and easily installed on new servers, allowing any lab to quickly set up a Graphle site from which their own biological network data can be shared online.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PILGRM: an interactive data-driven discovery platform for expert biologists

Author: Avraham
Bolstad
C. S. Greene
Calviello
Chikina
Dai
Faith
Gautier
Gentleman
Giaever
Harsha
Hess
Hibbs
Irizarry
Nielsen
O. G. Troyanskaya
Santos
Tedder
Yan
Publication venue: Oxford University Press
Publication date: 01/01/2011

PILGRM (the platform for interactive learning by genomics results mining) puts advanced supervised analysis techniques applied to enormous gene expression compendia into the hands of bench biologists. This flexible system lets its users answer, in a data-driven manner, diverse biological questions that are often outside the scope of common databases. This capability allows domain experts to quickly and easily generate hypotheses about biological processes, tissues or diseases of interest. Specifically, PILGRM helps biologists generate these hypotheses by analyzing the expression levels of known relevant genes in large compendia of microarray data. Because PILGRM is data-driven, it complements a user's knowledge and literature analysis with mining of diverse functional genomic data, thereby generating novel predictions that can drive experimental follow-up. This server is free, does not require registration and is available for use at http://pilgrm.princeton.edu

Princeton University Open Access Repository

Crossref

PubMed Central

MADAM - An open source meta-analysis toolbox for R and Bioconductor

Author: Armin Graber
DR Rhodes
F Hong
I Borozan
JD Storey
JK Choi
Karl G Kugler
Laurin AJ Mueller
O Troyanskaya
RA Fisher
RC Gentleman
VG Tusher
Y Moreau
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Author: Coller Hilary A
Flamholz Avi I
Hibbs Matthew A
Huttenhower Curtis
Landis Jessica N
Myers Chad L
Olszewski Kellen L
Sahi Sauhard
Siemers Nathan O
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
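
The mutual-nearest-neighbor idea in the abstract above can be illustrated with a small sketch. This is not the published NNN implementation: the function name `mutual_knn_graph`, the Euclidean distance, the toy profiles and the choice of k are all assumptions made for illustration, and the clique-merging step that NNN applies to the resulting network is omitted.

```python
from math import dist  # Euclidean distance; an assumed metric, not necessarily NNN's


def mutual_knn_graph(profiles, k):
    """Return the edges {g, h} where g and h are each among the other's k nearest genes."""
    names = list(profiles)
    nearest = {}
    for g in names:
        # Rank every other gene by distance to g and keep the k closest.
        others = sorted(
            (h for h in names if h != g),
            key=lambda h: dist(profiles[g], profiles[h]),
        )
        nearest[g] = set(others[:k])
    # Keep an edge only when the neighbor relation holds in both directions.
    return {
        frozenset((g, h))
        for g in names
        for h in nearest[g]
        if g in nearest[h]
    }


# Hypothetical expression profiles (one list of measurements per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [5.0, 5.0, 5.0],
    "geneD": [5.1, 4.9, 5.2],
    "geneE": [-10.0, 0.0, 10.0],
}

edges = mutual_knn_graph(profiles, k=1)
# Genes that appear in no mutual edge stay unclustered.
unclustered = sorted(g for g in profiles if not any(g in e for e in edges))
```

In this toy data `geneE` has a nearest neighbor, but that neighbor does not return the relation, so `geneE` ends up in no edge. This mirrors the abstract's point that genes without sufficiently similar partners can remain unclustered.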

The Jackson Laboratory: The Mouseion at the JAXlibrary

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields

Author: Albertson
Beitzinger
Brown
Chin
E. M. Airoldi
Han
Heim
Huang
Jonsson
Nag
O. G. Troyanskaya
R. E. Schapire
Rocke
Rueda
Shah
Snijders
V. Dumeaux
van Beers
Wessels
Yi
Z. Barutcuoglu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations in selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome

Crossref

Harvard University - DASH

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Nucleosome-coupled expression differences in closely-related species

Author: AL Olins
AM Tsankov
BE Bernstein
C Koch
Corey Nislow
CT Harbison
DE Schones
E Segal
EA Sekinger
F Ozsolak
G Badis
G Zhu
GC Yuan
GJ Hogan
H Li
I Tirosh
IP Ioshikhes
JD Hughes
KA Zawadzki
Kyle Tsui
L Bai
Maitreya J Dunham
Marinella Gebbia
N Kaplan
N Morohashi
O Elemento
O Troyanskaya
OC Martin
Olga G Troyanskaya
P Cliften
P Clifton
RD Kornberg
S Mahony
S Shivaswamy
S Washietl
SW Doniger
T Owen-Hughes
T Pramila
Victoria Yao
W Lee
X Liu
Y Guan
Y Zhang
Yuanfang Guan
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genome-wide nucleosome occupancy is negatively related to the average level of transcription factor motif binding based on studies in yeast and several other model organisms. The degree to which nucleosome-motif interactions relate to phenotypic changes across species is, however, unknown. Results: We address this challenge by generating nucleosome positioning and cell cycle expression data for Saccharomyces bayanus and show that differences in nucleosome occupancy reflect cell cycle expression divergence between two yeast species, S. bayanus and S. cerevisiae. Specifically, genes with nucleosome-depleted MBP1 motifs upstream of their coding sequence show periodic expression during the cell cycle, whereas genes with nucleosome-shielded motifs do not. In addition, conserved cell cycle regulatory motifs across these two species are more nucleosome-depleted compared to those that are not conserved, suggesting that the degree of conservation of regulatory sites varies, and is reflected by nucleosome occupancy patterns. Finally, many changes in cell cycle gene expression patterns across species can be correlated to changes in nucleosome occupancy on motifs (rather than to the presence or absence of motifs). Conclusions: Our observations suggest that alteration of nucleosome occupancy is a previously uncharacterized feature related to the divergence of cell cycle expression between species.

University of Toronto Research Repository

Princeton University Open Access Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Moore-Penrose Pseudoinverse. A Tutorial Review of the Theory

Author: AN Tikhonov
C Hennig
CW Groetsch
DK Sodickson
EH Moore
G Lohmann
H Ammari
H Andrews
J-Z Wang
João Carlos Alves Barata
L Bedini
Mahir Saleh Hussein
MT Page
NG Gençer
O Bratteli
O Troyanskaya
R Penrose
R Walle Van De
RD Pascual-Marqui
S Atzori
TV Ricci
X Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/10/2011

In recent decades the Moore-Penrose pseudoinverse has found a wide range of applications in many areas of science and has become a useful tool for physicists dealing, for instance, with optimization problems, data analysis, and the solution of linear integral equations. The existence of such applications alone should attract the interest of students and researchers in the Moore-Penrose pseudoinverse and in related subjects, like the singular value decomposition theorem for matrices. In this note we present a tutorial review of the theory of the Moore-Penrose pseudoinverse. We present the first definitions and some motivations and, after obtaining some basic results, we center our discussion on the Spectral Theorem and present an algorithmically simple expression for the computation of the Moore-Penrose pseudoinverse of a given matrix. We do not claim originality of the results. We rather intend to present a complete and self-contained tutorial review, useful for those more devoted to applications, for those more theoretically oriented and for those who already have some working knowledge of the subject. Comment: 23 pages
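
For orientation, the standard textbook definition of the pseudoinverse and its expression through the singular value decomposition are recalled below. These are the usual Penrose conditions and the usual SVD formula; the spectral-theorem expression that the abstract refers to is not reproduced in this listing and may be stated differently in the paper itself.

```latex
% The pseudoinverse A^{+} of a matrix A is the unique matrix satisfying
% the four Penrose conditions:
\begin{align}
  A A^{+} A &= A, & A^{+} A A^{+} &= A^{+}, \\
  (A A^{+})^{*} &= A A^{+}, & (A^{+} A)^{*} &= A^{+} A.
\end{align}

% If A = U \Sigma V^{*} is a singular value decomposition of A, then
\begin{equation}
  A^{+} = V \Sigma^{+} U^{*},
\end{equation}
% where \Sigma^{+} is obtained from \Sigma by transposing it and replacing
% each nonzero singular value \sigma_i by 1/\sigma_i (zeros stay zero).
```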

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Spatio-Temporal Data Imputation Model for Supporting Analytics at the Edge

Author: D Bertsimas
D Stekhoven
G Carpenter
H Cai
J Honaker
J Xing
L Jiang
L Kim
M Satyanarayanan
N Jiang
NC Guan
O Troyanskaya
P Schmitt
PJ Escamilla-Ambrosio
R Little
R Mazumder
S Buuren
S Oba
T Bo
T Raghunathan
X Wang
Y He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Current applications developed for the Internet of Things (IoT) usually process collected data in order to deliver analytics and support efficient decision making. The basis for any processing mechanism is data analysis, whose usual outcome is a set of responses to analytics queries defined by end users or applications. However, as already noted in the literature, data analysis cannot be efficient when missing values are present. The research community has already proposed various missing data imputation methods, paying most attention to the statistical aspect of the problem. In this paper, we study the problem and propose a method that combines machine learning and a consensus scheme. We focus on the clustering of the IoT devices, assuming they observe the same phenomenon and report the collected data to the edge infrastructure. Through a sliding window approach, we try to detect IoT nodes that report similar contextual values to edge nodes and rely on them to deliver the replacement value for missing data. We describe our model and report results from an extensive set of simulations on top of real data. Our aim is to reveal the potential of the proposed scheme and place it in the relevant literature
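
A minimal sketch of the sliding-window idea described above follows. It is not the paper's model: the similarity test (mean absolute difference over the window), the threshold `tol`, the use of a plain average as the consensus value, and all names and readings are assumptions chosen for illustration.

```python
from statistics import mean


def impute_from_peers(target_window, peer_windows, peer_current, tol):
    """Estimate a missing reading from peers whose recent windows resemble the target's.

    target_window: recent readings of the node with the missing value
    peer_windows:  recent readings of every other node (same window length)
    peer_current:  each peer's reading at the time step that is missing
    tol:           maximum mean absolute difference to count as "similar" (assumed rule)
    """
    similar = [
        peer
        for peer, window in peer_windows.items()
        if mean(abs(a - b) for a, b in zip(target_window, window)) <= tol
    ]
    if not similar:
        return None  # no consensus available; leave the value missing
    # Consensus step, simplified here to an unweighted average.
    return mean(peer_current[p] for p in similar)


# Hypothetical temperature readings over a window of three time steps.
target_window = [20.0, 20.5, 21.0]
peer_windows = {
    "n1": [20.1, 20.4, 21.1],
    "n2": [25.0, 25.5, 26.0],
    "n3": [19.8, 20.6, 20.9],
}
peer_current = {"n1": 21.4, "n2": 26.3, "n3": 21.6}

estimate = impute_from_peers(target_window, peer_windows, peer_current, tol=0.5)
```

Here `n2` differs from the target by about five degrees across the whole window and is excluded, so the estimate is built from `n1` and `n3` only.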

Crossref

Enlighten

A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

Author: A Enright
A Gavin
A Grigoriev
A Hoerl
AJ Dobson
EG WS Cleveland
G GH
GRG Lanckriet
H Ge
M Deng
M Eisen
M Fellenberg
MPS Brown
O Troyanskaya
P Liang
P Pavlidis
R Overbeek
R Tibshirani
Walter L Ruzzo
WS Noble
Y Zheng
Zizhen Yao
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. METHODS: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. RESULTS: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. CONCLUSION: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets
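
The weighted-combination and voting steps described above can be sketched as follows. This is only an illustration under stated assumptions: the weights are hard-coded rather than fit by regression as in the paper, the vote is a simple similarity-weighted tally rather than the authors' voting scheme, and the gene names, class labels and similarity values are invented.

```python
def combined_similarity(base_sims, weights):
    """Weighted sum of base similarity measures (weights assumed to be already learned)."""
    return sum(w * s for w, s in zip(weights, base_sims))


def knn_predict(sims_to_target, labels, weights, k):
    """Predict a class for the target gene from its k most similar labeled genes.

    sims_to_target: gene -> tuple of base similarities between that gene and the target
    labels:         gene -> known functional class
    Returns (predicted class, share of the similarity-weighted vote it received).
    """
    ranked = sorted(
        ((combined_similarity(s, weights), g) for g, s in sims_to_target.items()),
        reverse=True,
    )[:k]
    votes = {}
    for score, gene in ranked:
        votes[labels[gene]] = votes.get(labels[gene], 0.0) + score
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())


# Hypothetical data: two base measures per gene, e.g. expression and sequence similarity.
sims_to_target = {
    "g1": (0.9, 0.2),
    "g2": (0.8, 0.4),
    "g3": (0.1, 0.9),
    "g4": (0.2, 0.1),
}
labels = {"g1": "transport", "g2": "transport", "g3": "metabolism", "g4": "metabolism"}
weights = (0.7, 0.3)  # assumed; the paper infers such weights by regression

predicted, confidence = knn_predict(sims_to_target, labels, weights, k=3)
```

With these weights the three nearest genes are `g1`, `g2` and `g3`, so the vote favors "transport" with roughly 80% of the weighted tally. Changing the weights changes which genes count as neighbors, which is why the abstract treats the choice of similarity metric as the weak point of plain KNN.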

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Latent physiological factors of complex human diseases revealed by independent component analysis of clinarrays

Author: A Falco
A Frigyesi
A Hyvarinen
AJ Hanley
Atul J Butte
BP O'Sullivan
CS Haworth
David P Chen
DP Chen
DW Seldin
FT Fischbach
G Mastella
G Sterner
HJ Gould
HL Young
I Tillie-Leblond
JK Perloff
Joel T Dudley
M Leshin
M Leslie
O Kordonouri
O Troyanskaya
P Comon
R EH
RM Aris
S Raychaudhuri
SA Saidi
TA Hillier
TA Wren
V Kiviniemi
W Liebermeister
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Diagnosis and treatment of patients in the clinical setting is often driven by known symptomatic factors that distinguish one particular condition from another. Treatment based on noticeable symptoms, however, is limited to the types of clinical biomarkers collected, and is prone to overlooking dysfunctions in physiological factors not easily evident to medical practitioners. We used a vector-based representation of patient clinical biomarkers, or clinarrays, to search for latent physiological factors that underlie human diseases directly from clinical laboratory data. Knowledge of these factors could be used to improve assessment of disease severity and help to refine strategies for diagnosis and monitoring disease progression. Results: Applying Independent Component Analysis to clinarrays built from patient laboratory measurements revealed both known and novel concomitant physiological factors for asthma, types 1 and 2 diabetes, cystic fibrosis, and Duchenne muscular dystrophy. Serum sodium was found to be the most significant factor for both type 1 and type 2 diabetes, and was also significant in asthma. TSH3, a measure of thyroid function, and blood urea nitrogen, indicative of kidney function, were factors unique to type 1 diabetes relative to type 2 diabetes. Platelet count was significant across all the diseases analyzed. Conclusions: The results demonstrate that large-scale analyses of clinical biomarkers using unsupervised methods can offer novel insights into the pathophysiological basis of human disease, and suggest novel clinical utility of established laboratory measurements.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California